textual content
Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level
Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks, particularly Graph Injection Attacks (GIAs), which inject malicious nodes into the original graph and pose realistic threats.Text-attributed graphs (TAGs), where nodes are associated with textual features, are crucial due to their prevalence in real-world applications and are commonly used to evaluate these vulnerabilities.However, existing research only focuses on embedding-level GIAs, which inject node embeddings rather than actual textual content, limiting their applicability and simplifying detection.In this paper, we pioneer the exploration of GIAs at the text level, presenting three novel attack designs that inject textual content into the graph.Through theoretical and empirical analysis, we demonstrate that text interpretability, a factor previously overlooked at the embedding level, plays a crucial role in attack strength. Among the designs we investigate, the Word-frequency-based Text-level GIA (WTGIA) is particularly notable for its balance between performance and interpretability. Despite the success of WTGIA, we discover that defenders can easily enhance their defenses with customized text embedding methods or large language model (LLM)--based predictors. These insights underscore the necessity for further research into the potential and practical significance of text-level GIAs.
CrediBench: Building Web-Scale Network Datasets for Information Integrity
Kondrup, Emma, Sabry, Sebastian, Abdallah, Hussein, Yang, Zachary, Zhou, James, Pelrine, Kellin, Godbout, Jean-François, Bronstein, Michael M., Rabbany, Reihaneh, Huang, Shenyang
Online misinformation poses an escalating threat, amplified by the Internet's open nature and increasingly capable LLMs that generate persuasive yet deceptive content. Existing misinformation detection methods typically focus on either textual content or network structure in isolation, failing to leverage the rich, dynamic interplay between website content and hyperlink relationships that characterizes real-world misinformation ecosystems. We introduce CrediBench: a large-scale data processing pipeline for constructing temporal web graphs that jointly model textual content and hyperlink structure for misinformation detection. Unlike prior work, our approach captures the dynamic evolution of general misinformation domains, including changes in both content and inter-site references over time. Our processed one-month snapshot extracted from the Common Crawl archive in December 2024 contains 45 million nodes and 1 billion edges, representing the largest web graph dataset made publicly available for misinformation research to date. From our experiments on this graph snapshot, we demonstrate the strength of both structural and webpage content signals for learning credibility scores, which measure source reliability. The pipeline and experimentation code are all available here, and the dataset is in this folder.
- North America > Canada > Quebec > Montreal (0.14)
- Europe > Austria > Vienna (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- (24 more...)
- Media > News (1.00)
- Government > Regional Government > North America Government > United States Government (0.68)
Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level
Graph Neural Networks (GNNs) excel across various applications but remain vulnerable to adversarial attacks, particularly Graph Injection Attacks (GIAs), which inject malicious nodes into the original graph and pose realistic threats.Text-attributed graphs (TAGs), where nodes are associated with textual features, are crucial due to their prevalence in real-world applications and are commonly used to evaluate these vulnerabilities.However, existing research only focuses on embedding-level GIAs, which inject node embeddings rather than actual textual content, limiting their applicability and simplifying detection.In this paper, we pioneer the exploration of GIAs at the text level, presenting three novel attack designs that inject textual content into the graph.Through theoretical and empirical analysis, we demonstrate that text interpretability, a factor previously overlooked at the embedding level, plays a crucial role in attack strength. Among the designs we investigate, the Word-frequency-based Text-level GIA (WTGIA) is particularly notable for its balance between performance and interpretability. Despite the success of WTGIA, we discover that defenders can easily enhance their defenses with customized text embedding methods or large language model (LLM)--based predictors. These insights underscore the necessity for further research into the potential and practical significance of text-level GIAs.
Enhancing Large Vision-Language Models with Layout Modality for Table Question Answering on Japanese Annual Securities Reports
Aida, Hayato, Takahashi, Kosuke, Omi, Takahiro
--With recent advancements in Large Language Models (LLMs) and growing interest in retrieval-augmented generation (RAG), the ability to understand table structures has become increasingly important. This is especially critical in financial domains such as securities reports, where highly accurate question answering (QA) over tables is required. However, tables exist in various formats--including HTML, images, and plain text--making it difficult to preserve and extract structural information. Therefore, multimodal LLMs are essential for robust and general-purpose table understanding. Despite their promise, current Large Vision-Language Models (L VLMs), which are major representatives of multimodal LLMs, still face challenges in accurately understanding characters and their spatial relationships within documents. In this study, we propose a method to enhance L VLM-based table understanding by incorporating in-table textual content and layout features. Experimental results demonstrate that these auxiliary modalities significantly improve performance, enabling robust interpretation of complex document layouts without relying on explicitly structured input formats. The T ableCellQA dataset--including rendered images, layout (bounding-box) annotations, and QA data--will be publicly released upon publication (license confirmation in progress).
Integrating Emotion Distribution Networks and Textual Message Analysis for X User Emotional State Classification
Moradbeiki, Pardis, Chahooki, Mohammad Ali Zare
As the popularity and reach of social networks continue to surge, a vast reservoir of opinions and sentiments across various subjects inundates these platforms. Among these, X social network (formerly Twitter) stands as a juggernaut, boasting approximately 420 million active users. Extracting users' emotional and mental states from their expressed opinions on social media has become a common pursuit. While past methodologies predominantly focused on the textual content of messages to analyze user sentiment, the interactive nature of these platforms suggests a deeper complexity. This study employs hybrid methodologies, integrating textual analysis, profile examination, follower analysis, and emotion dissemination patterns. Initially, user interactions are leveraged to refine emotion classification within messages, encompassing exchanges where users respond to each other. Introducing the concept of a communication tree, a model is extracted to map these interactions. Subsequently, users' bios and interests from this tree are juxtaposed with message text to enrich analysis. Finally, influential figures are identified among users' followers in the communication tree, categorized into different topics to gauge interests. The study highlights that traditional sentiment analysis methodologies, focusing solely on textual content, are inadequate in discerning sentiment towards significant events, notably the presidential election. Comparative analysis with conventional methods reveals a substantial improvement in accuracy with the incorporation of emotion distribution patterns and user profiles. The proposed approach yields a 12% increase in accuracy with emotion distribution patterns and a 15% increase when considering user profiles, underscoring its efficacy in capturing nuanced sentiment dynamics.
- North America > United States > New York (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- Europe > Middle East > Malta > Port Region > Southern Harbour District > Valletta (0.04)
- (2 more...)
TAIJI: Textual Anchoring for Immunizing Jailbreak Images in Vision Language Models
Yin, Xiangyu, Qi, Yi, Hu, Jinwei, Chen, Zhen, Dong, Yi, Zhao, Xingyu, Huang, Xiaowei, Ruan, Wenjie
Vision Language Models (VLMs) have demonstrated impressive inference capabilities, but remain vulnerable to jailbreak attacks that can induce harmful or unethical responses. Existing defence methods are predominantly white-box approaches that require access to model parameters and extensive modifications, making them costly and impractical for many real-world scenarios. Although some black-box defences have been proposed, they often impose input constraints or require multiple queries, limiting their effectiveness in safety-critical tasks such as autonomous driving. To address these challenges, we propose a novel black-box defence framework called \textbf{T}extual \textbf{A}nchoring for \textbf{I}mmunizing \textbf{J}ailbreak \textbf{I}mages (\textbf{TAIJI}). TAIJI leverages key phrase-based textual anchoring to enhance the model's ability to assess and mitigate the harmful content embedded within both visual and textual prompts. Unlike existing methods, TAIJI operates effectively with a single query during inference, while preserving the VLM's performance on benign tasks. Extensive experiments demonstrate that TAIJI significantly enhances the safety and reliability of VLMs, providing a practical and efficient solution for real-world deployment.
- Transportation (0.75)
- Information Technology > Security & Privacy (0.46)
HeTGB: A Comprehensive Benchmark for Heterophilic Text-Attributed Graphs
Li, Shujie, Wu, Yuxia, Shi, Chuan, Fang, Yuan
Graph neural networks (GNNs) have demonstrated success in modeling relational data primarily under the assumption of homophily. However, many real-world graphs exhibit heterophily, where linked nodes belong to different categories or possess diverse attributes. Additionally, nodes in many domains are associated with textual descriptions, forming heterophilic text-attributed graphs (TAGs). Despite their significance, the study of heterophilic TAGs remains underexplored due to the lack of comprehensive benchmarks. To address this gap, we introduce the Heterophilic Text-attributed Graph Benchmark (HeTGB), a novel benchmark comprising five real-world heterophilic graph datasets from diverse domains, with nodes enriched by extensive textual descriptions. HeTGB enables systematic evaluation of GNNs, pre-trained language models (PLMs) and co-training methods on the node classification task. Through extensive benchmarking experiments, we showcase the utility of text attributes in heterophilic graphs, analyze the challenges posed by heterophilic TAGs and the limitations of existing models, and provide insights into the interplay between graph structures and textual attributes. We have publicly released HeTGB with baseline implementations to facilitate further research in this field.
- North America > United States (0.95)
- Asia > Singapore (0.14)
- Asia > China (0.14)
MMCFND: Multimodal Multilingual Caption-aware Fake News Detection for Low-resource Indic Languages
Bansal, Shubhi, Singh, Nishit Sushil, Dar, Shahid Shafi, Kumar, Nagendra
The widespread dissemination of false information through manipulative tactics that combine deceptive text and images threatens the integrity of reliable sources of information. While there has been research on detecting fake news in high resource languages using multimodal approaches, methods for low resource Indic languages primarily rely on textual analysis. This difference highlights the need for robust methods that specifically address multimodal fake news in Indic languages, where the lack of extensive datasets and tools presents a significant obstacle to progress. To this end, we introduce the Multimodal Multilingual dataset for Indic Fake News Detection (MMIFND). This meticulously curated dataset consists of 28,085 instances distributed across Hindi, Bengali, Marathi, Malayalam, Tamil, Gujarati and Punjabi. We further propose the Multimodal Multilingual Caption-aware framework for Fake News Detection (MMCFND). MMCFND utilizes pre-trained unimodal encoders and pairwise encoders from a foundational model that aligns vision and language, allowing for extracting deep representations from visual and textual components of news articles. The multimodal fusion encoder in the foundational model integrates text and image representations derived from its pairwise encoders to generate a comprehensive cross modal representation. Furthermore, we generate descriptive image captions that provide additional context to detect inconsistencies and manipulations. The retrieved features are then fused and fed into a classifier to determine the authenticity of news articles. The curated dataset can potentially accelerate research and development in low resource environments significantly. Thorough experimentation on MMIFND demonstrates that our proposed framework outperforms established methods for extracting relevant fake news detection features.
- Asia > India (0.05)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
What Makes a Meme a Meme? Identifying Memes for Memetics-Aware Dataset Creation
Hazman, Muzhaffar, McKeever, Susan, Griffith, Josephine
Warning: This paper contains memes that may be offensive to some readers. Multimodal Internet Memes are now a ubiquitous fixture in online discourse. One strand of meme-based research is the classification of memes according to various affects, such as sentiment and hate, supported by manually compiled meme datasets. Understanding the unique characteristics of memes is crucial for meme classification. Unlike other user-generated content, memes spread via memetics, i.e. the process by which memes are imitated and transformed into symbols used to create new memes. In effect, there exists an ever-evolving pool of visual and linguistic symbols that underpin meme culture and are crucial to interpreting the meaning of individual memes. The current approach of training supervised learning models on static datasets, without taking memetics into account, limits the depth and accuracy of meme interpretation. We argue that meme datasets must contain genuine memes, as defined via memetics, so that effective meme classifiers can be built. In this work, we develop a meme identification protocol which distinguishes meme from non-memetic content by recognising the memetics within it. We apply our protocol to random samplings of the leading 7 meme classification datasets and observe that more than half (50. 4\%) of the evaluated samples were found to contain no signs of memetics. Our work also provides a meme typology grounded in memetics, providing the basis for more effective approaches to the interpretation of memes and the creation of meme datasets.
- North America > United States > New York > New York County > New York City (0.04)
- Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- (8 more...)
DocSynthv2: A Practical Autoregressive Modeling for Document Generation
Biswas, Sanket, Jain, Rajiv, Morariu, Vlad I., Gu, Jiuxiang, Mathur, Puneet, Wigington, Curtis, Sun, Tong, Lladós, Josep
While the generation of document layouts has been extensively explored, comprehensive document generation encompassing both layout and content presents a more complex challenge. This paper delves into this advanced domain, proposing a novel approach called DocSynthv2 through the development of a simple yet effective autoregressive structured model. Our model, distinct in its integration of both layout and textual cues, marks a step beyond existing layout-generation approaches. By focusing on the relationship between the structural elements and the textual content within documents, we aim to generate cohesive and contextually relevant documents without any reliance on visual components. Through experimental studies on our curated benchmark for the new task, we demonstrate the ability of our model combining layout and textual information in enhancing the generation quality and relevance of documents, opening new pathways for research in document creation and automated design. Our findings emphasize the effectiveness of autoregressive models in handling complex document generation tasks.